Theory-driven and Corpus-driven Computational Linguistics and the Use of Corpora

Author

  • Stefanie Dipper
Abstract

Computational linguistics and corpus linguistics are closely related disciplines: both exploit electronic corpora, extract various kinds of linguistic information from them, and use the same methods to acquire this information. Moreover, both were heavily affected by "paradigm shifts" from the prevailing empiricism of the 1950s to rationalism, and then back again with a revival of empirical methods in the 1990s. Computational linguistics deals with the formal modeling of natural language. The formal models can be used to draw conclusions about the structure and functioning of the human language system. They also form the basis of implemented systems for the analysis and generation of spoken or written language, in a variety of applications. The methods applied in building these models are of different kinds because, as a result of the above-mentioned paradigm changes, work in computational linguistics has taken two different paths. Both branches of computational linguistics aim to build models of natural language, but each exploits different techniques: the rationalist branch focuses on theory-driven, symbolic, nonstatistical methods, whilst the empiricist branch focuses on corpus-driven, statistical techniques. As we will see later, however, the distinction between the branches is less clear these days, and the two fields seem to be converging again as researchers successfully combine concepts and methods from each. Obviously, the corpus-driven branch of computational linguistics has a natural affinity to corpus linguistics, and a shared interest in corpus exploitation. As a consequence, many research topics can be attributed equally well to either computational linguistics or corpus linguistics; examples include part-of-speech tagging (see article 25), treebanking (article 17), semantic tagging (article 27), and coreference resolution (article 28), to name just a few.
At the opposite extremes of computational and corpus linguistics, however, the ultimate goals of corpus exploitation diverge: certain domains of corpus-driven computational linguistics aim to build "optimal" models "no matter how", and the particular corpus features that find their way into such models are not seen as interesting per se; in contrast, corpus linguistics could be said to target exactly these features, the "ingredients" of the models.

Similar resources

Concordance-Based Data-Driven Learning Activities and Learning English Phrasal Verbs in EFL Classrooms

In spite of the highly beneficial applications of corpus linguistics in language pedagogy, it has not found its way into mainstream EFL. The major reasons seem to be the teachers’ lack of training and the unavailability of resources, especially computers in language classes. Phrasal verbs have been shown to be a problematic area of learning English as a foreign language due to their semantic op...

Hedges in English for Academic Purposes: A Corpus-based study of Iranian EFL learners

Hedges, as tools to express tentativeness and doubt, have been studied in plenty of research papers in the Iranian EFL research setting. However, their use in a learner corpus, portraying Iranian learner English, is in need of more research attention. With this end in view, this study aimed at investigating how Iranian EFL learners who have majored in English-related fields in Iran deployed hed...

The Effect of Morphology on Dependency Parsing of Persian

Data-driven systems can be adapted easily to different languages and domains. This trend has led to the introduction of data-driven approaches to dependency parsing. The only prerequisite for such approaches is the existence of appropriate corpora that contain sentences and their associated dependency trees. Despite obtaining highly accurate results for the dependency parsing task in the English langu...

A Comparative Study of Metaphorical Markers in Academic Research Articles

Although the use of metaphorical markers in corpora has been studied to a large extent (e.g., Glucksberg & Keysar 1993; Skorczynska & Deignan, 2006; Sznjder, 2005), no attempt to the best of the researchers' knowledge has been made to describe metaphorical marking in a comparative analysis of 2 corpora in both national and international journals of applied linguistics in Iran. The gap envisaged has ...

Recent grammatical change in English: data, description, theory

This chapter begins by considering the contrast between the data-driven paradigm characteristic of corpus linguistics and the theory-oriented paradigm characteristic of some other schools of linguistics, particularly those espousing a generative framework. To illustrate the corpus linguistics paradigm in detail, I present a case study of grammatical differences observed in the LOB and FLOB corp...

Publication date: 2007